Overview

Dataset info

Number of variables: 17
Number of observations: 1048574
Missing cells: 269891 (1.5%)
Duplicate rows: 56865 (5.4%)
Total size in memory: 136.0 MiB
Average record size in memory: 136.0 B
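
The percentages in this block can be re-derived from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total size divided by the number of observations; both assumptions are consistent with the reported figures but are not stated by the report itself.

```python
# Figures copied from the "Dataset info" block.
n_variables = 17
n_observations = 1_048_574
missing_cells = 269_891
duplicate_rows = 56_865
total_size_mib = 136.0

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: average record size = total size in bytes / observations.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells:  {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the three values agree with the 1.5%, 5.4% and 136.0 B shown in the report.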

Variables types

Numeric: 6
Categorical: 8
Boolean: 1
Date: 0
URL: 0
Text (Unique): 0
Rejected: 1
Unsupported: 1

Warnings

Dataset has 56865 (5.4%) duplicate rows [Warning]
Country has a high cardinality: 174 distinct values [Warning]
Crime_Level_in_the_City_of_Employement has 31453 (3.0%) zeros [Zeros]
Gender has 74127 (7.1%) missing values [Missing]
Hair_Color has 70211 (6.7%) missing values [Missing]
Profession has a high cardinality: 1355 distinct values [Warning]
Satisfation_with_employer has 38087 (3.6%) missing values [Missing]
University_Degree has 80600 (7.7%) missing values [Missing]
Work_Experience_in_Current_Job_[years] is an unsupported type, check if it needs cleaning or further analysis [Warning]
Year_of_Record is highly correlated with Instance (ρ = 0.9999164589) [Rejected]
Yearly_Income_in_addition_to_Salary_(e.g._Rental_Income) has a high cardinality: 88309 distinct values [Warning]

Variables

Age
Numeric

Distinct count: 109
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 37.33747737
Minimum: 14
Maximum: 125
Zeros (%): 0.0%

Quantile statistics

Minimum: 14
5-th percentile: 16
Q1: 24
Median: 35
Q3: 48
95-th percentile: 67
Maximum: 125
Range: 111
Interquartile range: 24

Descriptive statistics

Standard deviation: 15.99811038
Coef of variation: 0.4284732528
Kurtosis: 0.07511692036
Mean: 37.33747737
MAD: 13.07147022
Skewness: 0.7004130848
Sum: 39151108
Variance: 255.9395358
Memory size: 8.0 MiB
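
Several of the Age statistics are simple functions of the others, which makes them easy to sanity-check. The sketch below uses the standard definitions (variance as the squared standard deviation, coefficient of variation as standard deviation over mean, interquartile range as Q3 minus Q1); the variable names are illustrative only.

```python
import math

# Values copied from the Age statistics.
mean = 37.33747737
std = 15.99811038
q1, q3 = 24, 48
minimum, maximum = 14, 125

variance = std ** 2              # variance is the squared standard deviation
coef_of_variation = std / mean   # relative spread around the mean
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # full range

print(variance, coef_of_variation, iqr, value_range)
assert math.isclose(variance, 255.9395358, rel_tol=1e-6)
assert math.isclose(coef_of_variation, 0.4284732528, rel_tol=1e-6)
```

The same relations can be checked for every other numeric variable in the report.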
[Figure: Histogram with fixed size bins (bins=50)]
[Figure: Histogram with variable size bins (bins=[ 14. 26.5 31.5 35.5 37.5 ... 103.5 104.5 110.5 117.5 125. ], "bayesian blocks" binning strategy used)]

Value              Count   Frequency (%)
20                 25928   2.5%
16                 25586   2.4%
15                 25522   2.4%
22                 25438   2.4%
24                 25396   2.4%
23                 25337   2.4%
18                 25283   2.4%
26                 25163   2.4%
17                 25106   2.4%
21                 25081   2.4%
Other values (99)  794734  75.8%

Minimum 5 values

Value  Count  Frequency (%)
14     12679  1.2%
15     25522  2.4%
16     25586  2.4%
17     25106  2.4%
18     25283  2.4%

Maximum 5 values

Value  Count  Frequency (%)
125    3      < 0.1%
122    2      < 0.1%
121    1      < 0.1%
119    2      < 0.1%
118    2      < 0.1%

Body_Height_[cm]
Numeric

Distinct count: 179
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 175.1619895
Minimum: 82
Maximum: 272
Zeros (%): 0.0%

Quantile statistics

Minimum: 82
5-th percentile: 144
Q1: 160
Median: 174
Q3: 190
95-th percentile: 208
Maximum: 272
Range: 190
Interquartile range: 30

Descriptive statistics

Standard deviation: 19.92929899
Coef of variation: 0.1137763909
Kurtosis: -0.3923540455
Mean: 175.1619895
MAD: 16.41314719
Skewness: 0.0819166801
Sum: 183670308
Variance: 397.1769584
Memory size: 8.0 MiB

[Figure: Histogram with fixed size bins (bins=50)]
[Figure: Histogram with variable size bins (bins=[ 82. 98.5 102.5 110.5 114.5 ... 240.5 245.5 250.5 258.5 272. ], "bayesian blocks" binning strategy used)]

Value               Count   Frequency (%)
167                 19098   1.8%
169                 19022   1.8%
166                 18952   1.8%
172                 18871   1.8%
173                 18785   1.8%
163                 18767   1.8%
165                 18753   1.8%
170                 18691   1.8%
168                 18632   1.8%
164                 18605   1.8%
Other values (169)  860398  82.1%

Minimum 5 values

Value  Count  Frequency (%)
82     1      < 0.1%
83     1      < 0.1%
90     1      < 0.1%
91     1      < 0.1%
94     1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
272    1      < 0.1%
269    1      < 0.1%
268    1      < 0.1%
266    1      < 0.1%
265    1      < 0.1%

Country
Categorical

Distinct count: 174
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value               Count   Frequency (%)
Honduras            111155  10.6%
Switzerland         19881   1.9%
Togo                19652   1.9%
Israel              19548   1.9%
Austria             19449   1.9%
Serbia              19258   1.8%
Tajikistan          19134   1.8%
Laos                19036   1.8%
Papua New Guinea    19024   1.8%
Sierra Leone        18937   1.8%
Other values (164)  763500  72.8%

Max length: 24
Mean length: 8.371749633
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

Crime_Level_in_the_City_of_Employement
Numeric

Distinct count: 202
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 83.47229475
Minimum: 0
Maximum: 206
Zeros (%): 3.0%

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 40
Median: 84
Q3: 125
95-th percentile: 164
Maximum: 206
Range: 206
Interquartile range: 85

Descriptive statistics

Standard deviation: 50.03143793
Coef of variation: 0.5993777705
Kurtosis: -1.094413831
Mean: 83.47229475
MAD: 42.97572014
Skewness: 0.05160594991
Sum: 87526878
Variance: 2503.144781
Memory size: 8.0 MiB

[Figure: Histogram with fixed size bins (bins=50)]
[Figure: Histogram with variable size bins (bins=[ 0. 2. 4.5 5.5 6.5 ... 196.5 198.5 199.5 201.5 206. ], "bayesian blocks" binning strategy used)]

Value               Count   Frequency (%)
0                   31453   3.0%
5                   15900   1.5%
125                 9711    0.9%
6                   9115    0.9%
16                  8746    0.8%
130                 8565    0.8%
117                 8539    0.8%
136                 8279    0.8%
86                  8252    0.8%
120                 8151    0.8%
Other values (192)  931863  88.9%

Minimum 5 values

Value  Count  Frequency (%)
0      31453  3.0%
4      6502   0.6%
5      15900  1.5%
6      9115   0.9%
7      82     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
206    1      < 0.1%
203    17     < 0.1%
202    1      < 0.1%
201    34     < 0.1%
200    17     < 0.1%

Gender
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing (%): 7.1%
Missing (n): 74127

Value      Count   Frequency (%)
male       400113  38.2%
other      252053  24.0%
female     237489  22.6%
unknown    62957   6.0%
f          15031   1.4%
0          6804    0.6%
(Missing)  74127   7.1%

Max length: 7
Mean length: 4.740310174
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False
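
The Gender frequency table mixes several spellings ("female" and "f") with placeholder-like values ("0", "unknown"). A minimal sketch of one possible clean-up follows. The mapping is an assumption about what these labels mean, not something the report states, and `normalize_gender` is a hypothetical helper.

```python
from collections import Counter

# Assumed mapping: "f" is read as "female"; "0" and "unknown" are read as missing.
GENDER_MAP = {"male": "male", "female": "female", "f": "female", "other": "other"}
MISSING_TOKENS = {"0", "unknown", ""}

def normalize_gender(raw):
    """Return a canonical label, or None when the value carries no information."""
    if raw is None:
        return None
    value = str(raw).strip().lower()
    if value in MISSING_TOKENS:
        return None
    return GENDER_MAP.get(value, value)

# Counts copied from the Gender frequency table; None stands for "(Missing)".
reported = {"male": 400113, "other": 252053, "female": 237489,
            "unknown": 62957, "f": 15031, "0": 6804, None: 74127}

merged = Counter()
for label, count in reported.items():
    merged[normalize_gender(label)] += count

for label, count in merged.most_common():
    print(label, count)
```

Under this mapping the seven reported categories collapse to three labels plus a single missing bucket, and the total still equals the number of observations.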

Hair_Color
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing (%): 6.7%
Missing (n): 70211

Value      Count   Frequency (%)
Black      400652  38.2%
Blond      253993  24.2%
Brown      253958  24.2%
Red        63281   6.0%
Unknown    6279    0.6%
0          200     < 0.1%
(Missing)  70211   6.7%

Max length: 7
Mean length: 4.756597055
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False

Housing_Situation
Categorical

Distinct count: 10
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value             Count   Frequency (%)
Large House       166291  15.9%
Medium House      138527  13.2%
Castle            124551  11.9%
Large Apartment   124341  11.9%
nA                124052  11.8%
Small House       123710  11.8%
Medium Apartment  97547   9.3%
0                 76481   7.3%
0                 65536   6.3%
Small Apartment   7538    0.7%

Max length: 16
Mean length: 9.087292838
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True
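
The value 0 appears twice in the Housing_Situation frequency table. One plausible explanation, offered here only as an assumption, is that the column mixes the integer 0 with the string "0": the two print identically but count as distinct values. The sketch below reproduces that effect on a small hypothetical sample and shows one way to fold both, together with "nA", into a single missing marker.

```python
from collections import Counter

# Hypothetical sample: integer 0 and string "0" are different dictionary keys.
values = ["Castle", 0, "0", "nA", 0, "Small House", "0"]
raw_counts = Counter(values)
print(raw_counts[0], raw_counts["0"])  # counted separately

# Assumed placeholder tokens (compared case-insensitively).
PLACEHOLDERS = {"0", "na"}

def normalize_housing(raw):
    """Map placeholder tokens to None; return other values as stripped strings."""
    text = str(raw).strip()
    return None if text.lower() in PLACEHOLDERS else text

clean_counts = Counter(normalize_housing(v) for v in values)
print(clean_counts)
```

Whether "nA" and 0 really denote missing data would need to be confirmed against the dataset's documentation.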

Instance
Numeric

Distinct count: 991709
Unique (%): 94.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 495114.1148
Minimum: 1
Maximum: 991709
Zeros (%): 0.0%

Quantile statistics

Minimum: 1
5-th percentile: 52429.65
Q1: 262144.25
Median: 489008.5
Q3: 729565.75
95-th percentile: 939280.35
Maximum: 991709
Range: 991708
Interquartile range: 467421.5

Descriptive statistics

Standard deviation: 278454.8987
Coef of variation: 0.5624054947
Kurtosis: -1.097939364
Mean: 495114.1148
MAD: 235415.1237
Skewness: 0.007948865954
Sum: 5.191637878e+11
Variance: 7.753713059e+10
Memory size: 8.0 MiB

[Figure: Histogram with fixed size bins (bins=50)]
[Figure: Histogram with variable size bins (bins=[1.000000e+00 4.537295e+05 5.105745e+05 5.995575e+05 5.995625e+05 9.917090e+05], "bayesian blocks" binning strategy used)]

Value                  Count    Frequency (%)
599558                 5        < 0.1%
599560                 5        < 0.1%
599562                 5        < 0.1%
599559                 5        < 0.1%
599561                 5        < 0.1%
483177                 2        < 0.1%
481128                 2        < 0.1%
489625                 2        < 0.1%
487576                 2        < 0.1%
460951                 2        < 0.1%
Other values (991699)  1048539  > 99.9%

Minimum 5 values

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
991709  1      < 0.1%
991708  1      < 0.1%
991707  1      < 0.1%
991706  1      < 0.1%
991705  1      < 0.1%

Profession
Categorical

Distinct count: 1355
Unique (%): 0.1%
Missing (%): 0.3%
Missing (n): 2853

Value                          Count    Frequency (%)
payment analyst                2374     0.2%
permit records assistant       2220     0.2%
postal service mail sorter     2211     0.2%
producer                       2205     0.2%
photographer                   2202     0.2%
patternmaker                   2192     0.2%
parking enforcement officer    2179     0.2%
program and policy specialist  2177     0.2%
probation officer trainee      2171     0.2%
policy writer                  2170     0.2%
Other values (1344)            1023620  97.6%
(Missing)                      2853     0.3%

Max length: 67
Mean length: 20.50655748
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

Satisfation_with_employer
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing (%): 3.6%
Missing (n): 38087

Value           Count   Frequency (%)
Average         487634  46.5%
Happy           351808  33.6%
Somewhat Happy  155844  14.9%
Unhappy         15201   1.4%
(Missing)       38087   3.6%

Max length: 14
Mean length: 7.224060486
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

Size_of_City
Numeric

Distinct count: 615217
Unique (%): 58.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 833354.543
Minimum: 15
Maximum: 49993331
Zeros (%): 0.0%

Quantile statistics

Minimum: 15
5-th percentile: 14602
Q1: 72837
Median: 504480
Q3: 1183643.5
95-th percentile: 2188388.75
Maximum: 49993331
Range: 49993316
Interquartile range: 1110806.5

Descriptive statistics

Standard deviation: 2135273.365
Coef of variation: 2.562262824
Kurtosis: 275.8176521
Mean: 833354.543
MAD: 746209.2716
Skewness: 15.22188227
Sum: 8.738339066e+11
Variance: 4.559392342e+12
Memory size: 8.0 MiB

[Figure: Histogram with fixed size bins (bins=50)]
[Figure: Histogram with variable size bins (bins=[1.50000000e+01 4.70000000e+01 5.05000000e+01 1.00500000e+02 1.01500000e+02 ... 4.74889990e+07 4.74937625e+07 4.90803945e+07 4.90863550e+07 4.99933310e+07], "bayesian blocks" binning strategy used)]

Value                  Count    Frequency (%)
61172                  55       < 0.1%
23604                  50       < 0.1%
38331                  49       < 0.1%
56769                  40       < 0.1%
30932                  40       < 0.1%
7251                   39       < 0.1%
94160                  38       < 0.1%
35091                  38       < 0.1%
60004                  38       < 0.1%
86657                  38       < 0.1%
Other values (615207)  1048149  > 99.9%

Minimum 5 values

Value  Count  Frequency (%)
15     2      < 0.1%
20     1      < 0.1%
23     1      < 0.1%
24     2      < 0.1%
30     1      < 0.1%

Maximum 5 values

Value     Count  Frequency (%)
49993331  2      < 0.1%
49988987  1      < 0.1%
49938600  1      < 0.1%
49929902  1      < 0.1%
49907324  1      < 0.1%

Total_Yearly_Income_[EUR]
Numeric

Distinct count: 837862
Unique (%): 79.9%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 68509.7334
Minimum: 6.37
Maximum: 2548790.96
Zeros (%): 0.0%

Quantile statistics

Minimum: 6.37
5-th percentile: 1087.2665
Q1: 5333.73
Median: 21890.815
Q3: 86030.5075
95-th percentile: 285810.3035
Maximum: 2548790.96
Range: 2548784.59
Interquartile range: 80696.7775

Descriptive statistics

Standard deviation: 111929.8077
Coef of variation: 1.633779642
Kurtosis: 20.1184674
Mean: 68509.7334
MAD: 72749.63443
Skewness: 3.520711764
Sum: 7.183752519e+10
Variance: 1.252828186e+10
Memory size: 8.0 MiB

[Figure: Histogram with fixed size bins (bins=50)]
[Figure: Histogram with variable size bins (bins=[6.37000000e+00 2.60000000e+01 4.36200000e+01 4.36500000e+01 7.16050000e+01 ... 1.05149878e+06 1.20139256e+06 1.41001710e+06 1.62520507e+06 2.54879096e+06], "bayesian blocks" binning strategy used)]

Value                  Count    Frequency (%)
8254.17                48       < 0.1%
1077.61                35       < 0.1%
8923.22                34       < 0.1%
5609.92                34       < 0.1%
10915.56               34       < 0.1%
26355.16               34       < 0.1%
2083.8                 34       < 0.1%
12170.64               34       < 0.1%
9327.26                34       < 0.1%
6615.41                33       < 0.1%
Other values (837852)  1048220  > 99.9%

Minimum 5 values

Value  Count  Frequency (%)
6.37   1      < 0.1%
6.92   1      < 0.1%
8.5    1      < 0.1%
8.66   1      < 0.1%
8.72   1      < 0.1%

Maximum 5 values

Value       Count  Frequency (%)
2548790.96  1      < 0.1%
2391657.56  1      < 0.1%
2332659.88  1      < 0.1%
2319623.71  1      < 0.1%
2285303.38  1      < 0.1%

University_Degree
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing (%): 7.7%
Missing (n): 80600

Value      Count   Frequency (%)
Bachelor   396585  37.8%
No         251689  24.0%
Master     250475  23.9%
PhD        62823   6.0%
0          6402    0.6%
(Missing)  80600   7.7%

Max length: 8
Mean length: 5.355443679
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False

Wears_Glasses
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count   Frequency (%)
1      524729  50.0%
0      523845  50.0%

Work_Experience_in_Current_Job_[years]
Unsupported

This variable is an unsupported type, check if it needs cleaning or further analysis

Unsupported value
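
The report does not say why this column is unsupported. A common cause is a column that mixes numbers with non-numeric tokens, and the sketch below assumes exactly that. `to_years` is a hypothetical helper, and the sample values (including the "n/a" token) are illustrative rather than taken from the dataset.

```python
import math

def to_years(raw):
    """Convert a raw cell to float years, or None if it is not a finite number."""
    try:
        years = float(str(raw).strip())
    except ValueError:
        return None
    # Reject NaN and infinities so they are treated as missing too.
    return years if math.isfinite(years) else None

# Hypothetical mixed-type sample.
samples = [17, "4.9", "21", " 6.3 ", "n/a", None, float("nan")]
cleaned = [to_years(value) for value in samples]
print(cleaned)
```

After a conversion of this kind the column could be profiled as an ordinary numeric variable, with the unparseable entries counted as missing.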

Year_of_Record
Highly correlated

This variable is highly correlated with Instance and should be ignored for analysis

Correlation: 0.9999164589
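
The report does not state which correlation coefficient it uses; the sketch below assumes Pearson's coefficient, a common default for rejecting numeric columns. `pearson` is an illustrative helper, and the toy data is hypothetical: it imitates a file in which rows are ordered by year, so that a running row identifier tracks the year closely.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: a row counter and a slowly increasing year.
instance = list(range(1, 11))
year = [1940, 1940, 1941, 1942, 1942, 1943, 1944, 1944, 1945, 1946]

rho = pearson(instance, year)
print(f"rho = {rho:.4f}")
```

A coefficient this close to 1 means that one column is almost a linear function of the other, which is why the report rejects Year_of_Record. Since Instance looks like a row identifier, it may be more sensible in practice to drop Instance and keep Year_of_Record; that choice is a judgment call and is not made by the report.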

Yearly_Income_in_addition_to_Salary_(e.g._Rental_Income)
Categorical

Distinct count: 88309
Unique (%): 8.4%
Missing (%): 0.0%
Missing (n): 0

Value                 Count   Frequency (%)
0 EUR                 946048  90.2%
71546.49 EUR          32      < 0.1%
81679.37 EUR          32      < 0.1%
51445.76 EUR          32      < 0.1%
125493.97 EUR         32      < 0.1%
137555.01 EUR         32      < 0.1%
46105.44 EUR          32      < 0.1%
21019.6 EUR           32      < 0.1%
30856.63 EUR          32      < 0.1%
20618.14 EUR          32      < 0.1%
Other values (88299)  102238  9.8%

Max length: 13
Mean length: 5.688788774
Min length: 5
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True
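
This column is classified as categorical because every value carries an " EUR" suffix, so the amounts are stored as text. A minimal sketch of converting them to numbers follows; `parse_eur` is a hypothetical helper, and it assumes every value has the form "<number> EUR" as in the frequency table.

```python
def parse_eur(raw):
    """Strip a trailing 'EUR' unit and return the amount as a float."""
    text = str(raw).strip()
    if text.upper().endswith("EUR"):
        text = text[:-3].strip()
    return float(text)

# Sample strings copied from the frequency table.
samples = ["0 EUR", "71546.49 EUR", "21019.6 EUR"]
amounts = [parse_eur(s) for s in samples]
print(amounts)
```

Once parsed, the column can be profiled as numeric, and the high-cardinality warning no longer applies.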

Correlations

Missing values

Sample

First rows

Age  Body_Height_[cm]  Country  Crime_Level_in_the_City_of_Employement  Gender  Hair_Color  Housing_Situation  Instance  Profession  Satisfation_with_employer  Size_of_City  Total_Yearly_Income_[EUR]  University_Degree  Wears_Glasses  Work_Experience_in_Current_Job_[years]  Year_of_Record  Yearly_Income_in_addition_to_Salary_(e.g._Rental_Income)
045182Afghanistan33otherBlack01group headUnhappy251796182.05No1171940.00 EUR
117172Afghanistan25femaleBlond02heavy vehicle and mobile equipment service technicianUnhappy22782046819.69No04.91940.00 EUR
248144Afghanistan34femaleBlond03sorterUnhappy8221348663.53Bachelor0211940.00 EUR
342152Albania70femaleBrown04quality control senior engineerAverage594772400.64No1181940.00 EUR
415180Albania51otherBlack05logisticianHappy234942816.18Master181940.00 EUR
526212Albania61maleBrown06unix/linux systems leadAverage306242572.16Bachelor1151940.00 EUR
622181Albania58maleBlack07purchasing agentAverage2880223336.93Bachelor1121940.00 EUR
715161Albania51femaleNaN08quality management specialistAverage15953183679.14Bachelor06.31940.00 EUR
837168Albania68maleBlack09investment officerHappy821142666.37Bachelor1151940.00 EUR
925186Albania60NaNBrown010riggerAverage20648993898.08Bachelor0131940.00 EUR

Last rows

Age  Body_Height_[cm]  Country  Crime_Level_in_the_City_of_Employement  Gender  Hair_Color  Housing_Situation  Instance  Profession  Satisfation_with_employer  Size_of_City  Total_Yearly_Income_[EUR]  University_Degree  Wears_Glasses  Work_Experience_in_Current_Job_[years]  Year_of_Record  Yearly_Income_in_addition_to_Salary_(e.g._Rental_Income)
104856445156Honduras165NaNBlackMedium House991700procurement specialistAverage9921138531.730119NaN0 EUR
104856561187Honduras178NaNBrownSmall House991701plumber's helperSomewhat Happy177001884943.15No124NaN44734.25 EUR
104856645153Honduras165NaNBlacknA991702strategic account managerAverage588192337.20Master120NaN0 EUR
104856721174Honduras132NaNBlackLarge Apartment991703staff analyst iiAverage76316933022.45No112NaN0 EUR
104856858157Honduras176NaNBlackMedium Apartment991704project managerAverage331583696531.37Bachelor123NaN0 EUR
104856932188Honduras150NaNBlondLarge Apartment991705motor vehicle operatorHappy54925141621.08PhD114NaN0 EUR
104857020197Honduras130NaNBlack0991706hiring plan analystSomewhat Happy11265461744.25Bachelor09NaN0 EUR
104857122178Honduras134NaNBlondMedium Apartment991707procurement contracting officerAverage90978254738.53Bachelor011NaN61353.98 EUR
104857254210Honduras173NaNBrownCastle991708kindergarten teacherSomewhat Happy120759111182.98Master021NaN0 EUR
104857337195Honduras156NaNBrown0991709Assistant NurseAverage133621280.21Master019NaN0 EUR